Pharmacokinetics and Pharmacodynamics of Myelotoxicity

Contents

NEW: K-Fold Validation

PD Friberg Model - Naive Pooled Inference

The first pharmacodynamic model we will look at (see fig 1) was developed by Friberg et al. It models the life cycle of blood cells as transitions through a series of states:

The blood cells in the proliferation state create new cells at a rate of $k_{prol}$ when at equilibrium and transition into the first transition state at a rate of $k_{tr}$. The blood cells also transition from one transition state to the next, or from the last transition state into circulation, at a rate of $k_{tr}$. The circulating blood cells are cleared from the bloodstream at a rate of $k_{circ}$. However, as in the Friberg paper, I will simplify the system by setting $k_{prol} = k_{tr} = k_{circ} = \frac{4}{\mathrm{MTT}}$, where MTT is the mean transit time. To maintain the concentration of circulating cells, $Circ$, at a constant baseline level $Circ_0$, there is a negative feedback loop acting on the proliferation rate.

The drug interacts with the system by inhibiting proliferation. The relation between drug concentration and drug effect is assumed to be linear, i.e. $E_\mathrm{drug} = slope \cdot Conc_\mathrm{drug}$.

This produces the equations:

$$ \dot{Prol} = k_{prol}(1-E_\mathrm{drug})(\frac{Circ_0}{Circ})^\gamma Prol - k_{tr} Prol \\ \dot{T_1} = k_{tr}Prol - k_{tr} T_1\\ \dot{T_2} = k_{tr}T_1 - k_{tr} T_2\\ \dot{T_3} = k_{tr}T_2 - k_{tr} T_3\\ \dot{Circ} = k_{tr}T_3 - k_{tr} Circ $$
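As a concrete illustration, the system above can be integrated with SciPy. This is a minimal sketch: the parameter values and the zero drug-concentration function are placeholders, not the fitted values.

```python
import numpy as np
from scipy.integrate import solve_ivp

def friberg_rhs(t, y, mtt, gamma, slope, circ0, conc):
    """Right-hand side of the Friberg model.

    y = [Prol, T1, T2, T3, Circ]; conc(t) returns the drug
    concentration (here a placeholder for the PK model output).
    """
    ktr = 4.0 / mtt  # k_prol = k_tr = k_circ = 4 / MTT
    prol, t1, t2, t3, circ = y
    e_drug = slope * conc(t)             # linear drug effect
    feedback = (circ0 / circ) ** gamma   # negative feedback on proliferation
    dprol = ktr * (1.0 - e_drug) * feedback * prol - ktr * prol
    dt1 = ktr * (prol - t1)
    dt2 = ktr * (t1 - t2)
    dt3 = ktr * (t2 - t3)
    dcirc = ktr * (t3 - circ)
    return [dprol, dt1, dt2, dt3, dcirc]

# Example: with no drug (conc = 0) and all states at circ0, every
# derivative is zero, so the system stays at its steady state.
circ0 = 5.0
sol = solve_ivp(friberg_rhs, (0.0, 100.0), [circ0] * 5,
                args=(100.0, 0.2, 0.1, circ0, lambda t: 0.0),
                rtol=1e-8, atol=1e-10)
```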

Simulated Data

I initially test fit the model to simulated data before trying to fit to the real data. This shows which methods can reliably infer the parameters. The noise used had both additive and multiplicative parts, i.e. $$ X_\mathrm{Obs}(t_i) = Circ(t_i) + \left(\sigma_{base} + \sigma_{rel} Circ(t_i)^\eta\right) \epsilon_i, $$ where $$ \epsilon_i \sim \mathcal{N}\left(0, 1\right). $$ I used the typical PD parameters that Friberg et al. inferred to create this synthesised data. The PK parameters came from the previous inference that I performed for Docetaxel. This dataset has 100 observations between the times -48 hours and 1440 hours (where dosing occurs at 0 hours).
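The noise model above can be sketched as follows. The noise parameter values and the constant stand-in for $Circ(t)$ are placeholders, not the values used to generate the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(1234)  # fixed seed for reproducibility

def add_combined_noise(circ, sigma_base, sigma_rel, eta, rng):
    """Corrupt the model output Circ(t_i) with combined additive
    and multiplicative Gaussian noise, eps_i ~ N(0, 1)."""
    eps = rng.standard_normal(circ.shape)
    return circ + (sigma_base + sigma_rel * circ ** eta) * eps

# 100 observation times, here with a constant stand-in for the ODE output.
circ = np.full(100, 5.0)
obs = add_combined_noise(circ, sigma_base=0.1, sigma_rel=0.05, eta=1.0, rng=rng)
```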

Initial Optimisation

Before performing Bayesian inference, I used optimisation to find the parameters with the highest likelihood given the data. This should give a good starting point for the Bayesian inference.

Before starting the optimisation, we can estimate the parameter $Circ_0$ from the pre-dose observations, since the system is at its baseline before dosing.
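One simple way to do this is a pre-dose mean (a sketch, assuming $Circ_0$ is taken as the average of observations before dosing at 0 hours; the example arrays are hypothetical):

```python
import numpy as np

def estimate_circ0(times, observations):
    """Estimate the baseline circulating count Circ_0 as the mean
    of the pre-dose observations (dosing occurs at t = 0)."""
    pre_dose = np.asarray(times) < 0.0
    return float(np.mean(np.asarray(observations)[pre_dose]))

times = np.array([-48.0, -24.0, 0.0, 24.0, 48.0])
obs = np.array([5.1, 4.9, 5.0, 3.2, 2.1])
circ0_hat = estimate_circ0(times, obs)  # mean of the two pre-dose points
```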

I also used two different error scores for the optimisation: the sum-of-squares error, and the log likelihood of the combined relative and constant Gaussian noise model.
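The two error scores can be sketched as follows; the prediction and observation arrays are hypothetical.

```python
import numpy as np

def sse(pred, obs):
    """Sum-of-squares error between model prediction and observations."""
    return float(np.sum((obs - pred) ** 2))

def combined_loglik(pred, obs, sigma_base, sigma_rel, eta):
    """Log likelihood under combined constant and relative Gaussian noise."""
    sigma_tot = sigma_base + sigma_rel * pred ** eta
    return float(np.sum(
        -0.5 * np.log(2.0 * np.pi)
        - np.log(sigma_tot)
        - (obs - pred) ** 2 / (2.0 * sigma_tot ** 2)
    ))

pred = np.array([5.0, 4.0, 3.0])
obs = np.array([5.2, 3.9, 3.1])
```

Minimising the SSE corresponds to maximising the likelihood only when the noise is constant; the combined log likelihood additionally fits the noise parameters themselves.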

Optimisation results

This is a useful way of estimating the correct parameters and visualising how well the model and estimated parameters fit the data. However, it would also be useful to know how certain we are about these parameter values. To do this, we use Bayesian inference and MCMC to sample the parameter space, giving a distribution of probable parameter values.

Real Data

The above inference was also performed on the in vivo data provided by Roche.

MCMC Sampling

To determine the parameter distribution, I performed MCMC sampling of the log posterior. However, for this system the noise term is unknown, so I will compare three models that share the same ODE structure but have different log likelihoods to capture different noise assumptions. These noise models are: combined constant and relative noise, constant Gaussian noise, and relative Gaussian noise.

To compare these models I will be using WAIC and LOO-PSIS scores, which estimate the prediction accuracy for out-of-sample data points without requiring out-of-sample data. To do this I will need to save the pointwise log likelihoods, i.e. $\log L(\theta_s \mid x^{obs}_i)$ for each data point $x^{obs}_i \in X^{obs}$ and each parameter sample $\theta_s$ obtained from the MCMC sampling.

Combined Constant and Relative Noise

If we assume that the distribution of observations around the true value has both a constant and a relative part, i.e. $$ X^{obs} = f\left(t, \theta\right) + \left(\sigma_{base} + \sigma_{rel} f\left(t, \theta\right)^\eta\right) \epsilon, $$ where $\epsilon \sim N\left(0,1\right)$,

then the likelihood of our data is given by

$$ \log L\left(\theta,\sigma_{base}, \sigma_{rel}, \eta | X^{obs}\right) = \sum_{i=1}^{n_o}\sum_{j=1}^{n_t}\left[\log L\left(\theta,\sigma_{base}, \sigma_{rel}, \eta|x^{obs}_{ij}\right)\right] $$

where the pointwise log likelihoods are

$$\log L\left(\theta,\sigma_{base}, \sigma_{rel}, \eta|x^{obs}_{ij}\right) = - \frac{1}{2} \log 2\pi - \log \sigma_{tot,ij} -\frac{\left(x^{obs}_{ij} - f_i\left(t_j, \theta\right)\right)^2}{2\sigma_{tot,ij}^2}$$

and $$\sigma_{tot,ij} = \sigma_{base, i} + f_i\left(t_j, \theta\right)^{\eta_i}\sigma_{rel, i}$$

Constant Gaussian Noise

If we assume that the distribution of observations around the true value is given by a Gaussian distribution with constant standard deviation, i.e. $$ X^{obs} = f(t, \theta) + \sigma \epsilon, $$ where $\epsilon \sim N(0,1)$,

then the likelihood of our data is given by

$$ \log L(\theta, \sigma | X^{obs}) = \sum_{i=1}^{n_o}\sum_{j=1}^{n_t}\left[\log L(\theta,\sigma|x^{obs}_{ij})\right] $$

where our pointwise log likelihoods are

$$\log L(\theta,\sigma|x^{obs}_{ij}) = - \frac{1}{2} \log 2\pi - \log\sigma_i -\frac{(x^{obs}_{ij} - f_i(t_j, \theta))^2}{2\sigma_i^2}$$

Relative Gaussian Noise

If we assume that the distribution of observations around the true value is relative to the magnitude of the true value, i.e. $$ X^{obs} = f\left(t, \theta\right) + \sigma f\left(t, \theta\right)^\eta \epsilon, $$ where $\epsilon \sim N\left(0,1\right)$,

then the likelihood of our data is given by

$$ \log L\left(\theta, \sigma, \eta | X^{obs}\right) = \sum_{i=1}^{n_o}\sum_{j=1}^{n_t}\left[\log L\left(\theta,\sigma, \eta|x^{obs}_{ij}\right)\right] $$

where our pointwise log likelihoods are

$$\log L\left(\theta,\sigma, \eta|x^{obs}_{ij}\right) = - \frac{1}{2} \log 2\pi - \log\left(f_i\left(t_j, \theta\right)^{\eta_i}\sigma_i\right) -\frac{\left(x^{obs}_{ij} - f_i\left(t_j, \theta\right)\right)^2}{2\left(f_i\left(t_j, \theta\right)^{\eta_i}\sigma_i\right)^2}$$

Getting Pointwise Log Likelihoods
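For the combined noise model, the pointwise log likelihoods can be collected into a samples-by-observations matrix, which is the shape the model-comparison tools expect. The parameter draws below are hypothetical stand-ins for actual MCMC samples.

```python
import numpy as np

def pointwise_loglik(pred, obs, sigma_base, sigma_rel, eta):
    """Pointwise log likelihoods under the combined noise model,
    one value per observation (not summed)."""
    sigma_tot = sigma_base + sigma_rel * pred ** eta
    return (-0.5 * np.log(2.0 * np.pi)
            - np.log(sigma_tot)
            - (obs - pred) ** 2 / (2.0 * sigma_tot ** 2))

# One row per MCMC parameter sample, one column per observation.
obs = np.array([5.2, 4.8, 3.1])
samples = [  # hypothetical (pred, sigma_base, sigma_rel, eta) per draw
    (np.array([5.0, 5.0, 3.0]), 0.10, 0.05, 1.0),
    (np.array([5.1, 4.9, 3.2]), 0.12, 0.04, 1.0),
]
loglik_matrix = np.stack([pointwise_loglik(p, obs, sb, sr, e)
                          for p, sb, sr, e in samples])
```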

Model Comparison

Using these samples and the pointwise log likelihoods, I can compare the ability of these models to predict out-of-sample data. I will be using the Python package ArviZ to calculate the WAIC and LOO scores.

Comparisons

I looked at the MCMC summary for each of the models, especially $\hat{r}$ to determine whether the chains converged, as well as comparing the models using both LOO-PSIS and WAIC.

$\hat{r}$ shows that when MCMC sampling was performed with constant Gaussian noise ("add" in the table), the chains did not converge. This explains the extremely low LOO score with a very high standard error. The multiplicative noise and combined noise models mostly converged and performed better in the LOO analysis, with combined noise performing best.

WAIC also confirms the results from LOO.

K-Fold Validation

For both the LOO and WAIC calculations a warning was given (LOO: the estimated shape parameter of the Pareto distribution is greater than 0.7; WAIC: for one or more samples the posterior variance of the log predictive densities exceeds 0.4). This is likely due to one or more of the observations being highly influential in the fit. Thus, to confirm the results of the model comparison, I will use a more computationally intensive method, K-fold validation, which is less affected by these influential points.

We start this process by splitting the data into $K$ subsets, $X^{obs}_k$.

The model then needs to be fit to each set $X^{obs}_{-k} = \cup_{j \neq k} X^{obs}_j$, providing the posterior distribution $P(\theta | X^{obs}_{-k})$.

The out-of-sample predictive power of the model for each $x^{obs}_i \in X^{obs}_k$ is then given by the expected log pointwise predictive density,

$$elpd_i = \log \frac{1}{S}\sum_{s=1}^{S}P(x^{obs}_i | \theta^{-k, s})$$

where $\theta^{-k, s}$ is the $s$-th parameter sample from the MCMC sampling performed on the $X^{obs}_{-k}$ dataset, and $S$ is the total number of posterior samples. The validation score is then the sum of $elpd_i$ over all $i$.
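The $elpd_i$ computation can be sketched with a numerically stable log-sum-exp; the held-out log likelihoods below are hypothetical.

```python
import numpy as np

def elpd_pointwise(loglik_samples):
    """elpd_i = log( (1/S) * sum_s p(x_i | theta^{-k,s}) ), computed
    stably from log likelihoods via the log-sum-exp trick.

    loglik_samples has shape (S, n_heldout)."""
    s = loglik_samples.shape[0]
    m = loglik_samples.max(axis=0)
    return m + np.log(np.exp(loglik_samples - m).sum(axis=0)) - np.log(s)

# Hypothetical log p(x_i | theta^{-k,s}) for S = 4 samples and 3 held-out
# points; every draw assigns the same density, so elpd_i = log(density).
loglik = np.log(np.array([
    [0.2, 0.5, 0.1],
    [0.2, 0.5, 0.1],
    [0.2, 0.5, 0.1],
    [0.2, 0.5, 0.1],
]))
elpd_i = elpd_pointwise(loglik)
kfold_score = float(elpd_i.sum())  # validation score: sum over all i
```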

We can see from $\hat{r}$ that the combined noise model was the only one whose MCMC chains converged (only one parameter for one of the subsets was not close to convergence), whereas the multiplicative noise and constant noise models had large difficulties converging.

This lack of convergence and the very small likelihoods caused numerical issues in the code, meaning a full K-fold comparison could not be made.